About Asynchronous Protocol

As of PTV xServer 1.18, an alternative protocol is available for potentially long-running requests. Instead of a single HTTP request that stays idle until the response is sent, the initial request starts a background job on the server. Clients query the job status to find out when the job has finished, then ask for the result. The server stores the result until it has been fetched or its retention period expires.

A job is the representation of an asynchronous transaction. It has a status which may include further detailed progress information. When finished, a job also has a result which may be an error message.

Jobs are kept persistent on the server for a certain period.

Benefits

These are the benefits of using the asynchronous protocol:

An interrupted HTTP connection no longer loses the result, which makes transactions much more reliable.
Some middleware proxies will forcibly terminate connections that are idle for too long. This problem vanishes when using the asynchronous protocol.
There are convenient operations that can report status and detailed progress information.
There are convenient operations that can stop or delete jobs. Results of stopped jobs can still be fetched.

Prerequisites

In order to use job requests, the PTV xServer has to persist responses until they are fetched. This is done with the help of a JDBC database and works out of the box for a single PTV xServer.

For a cluster of PTV xServer you need a central database. If you need a highly available solution you have to set up replication to your backup systems as well.

You can replace the bundled database with your own.

Installation and Administration Guide

Please refer to the Administrator's Guide for general database information.

Using a Custom Database

Nearly any JDBC database can be used for job management. The only requirement is that the JDBC driver fully implements the BLOB API, which is not the case for the PostgreSQL database management system.

The schema of the necessary database table XSERVER_JOBS is defined as follows:

CREATE TABLE "JOB"."XSERVER_JOBS"(
  ID varchar(36) PRIMARY KEY NOT NULL,
  XSERVER varchar(18),
  METHOD varchar(50),
  STATUS varchar(20) NOT NULL,
  ELAPSED bigint,
  PROGRESS blob,
  RESULT blob,
  FINISHTIME bigint,
  FETCHTIME bigint,
  LASTUPDATETIME bigint,
  USERID varchar(36)
);
CREATE INDEX XSERVER_JOBS_IDX1 ON "JOB"."XSERVER_JOBS"(STATUS);
CREATE INDEX XSERVER_JOBS_IDX2 ON "JOB"."XSERVER_JOBS"(FINISHTIME);
CREATE INDEX XSERVER_JOBS_IDX3 ON "JOB"."XSERVER_JOBS"(FETCHTIME);
CREATE INDEX XSERVER_JOBS_IDX4 ON "JOB"."XSERVER_JOBS"(LASTUPDATETIME);

Cluster Configuration

In order to set up a cluster of PTV xServer, one central database has to be set up and defined in the PTV xServer configuration files, ideally but not necessarily on a separate dedicated server.

You can designate one of the local databases as the central database, provided you configure the load balancer accordingly: the machine hosting the database will have less CPU power left for requests.

The configuration can be found in job-management-db.xml. The standard configuration of the Apache Derby database for PTV xRoute Server looks like this:

conf/job-management-db.xml
    <bean id="hikariConfigJob" class="com.zaxxer.hikari.HikariConfig">
        <property name="poolName" value="springHikariCPJob" />
        <property name="connectionTestQuery" value="VALUES 1" />
        <property name="connectionTimeout" value="5000" />
        <property name="dataSourceClassName" value="org.apache.derby.jdbc.ClientDataSource" />
        <property name="maximumPoolSize" value="10" />
        <property name="dataSourceProperties">
            <props>
                <prop key="serverName">localhost</prop>
                <prop key="databaseName">job</prop>
                <prop key="user">JOB</prop>
                <prop key="password">JOB</prop>
                <prop key="portNumber">50036</prop>
            </props>
        </property>
    </bean>

Please refer to your database vendor's manual to find out more about automatic replication.

Cleanup

Jobs have to be cleaned up periodically. Each PTV xServer comes with an automatic cleanup mechanism which will remove fetched, unfetched, and zombie jobs.

The job life cycle status (unfetched, fetched, zombie) is separate from its logical status (waiting, running, succeeded etc.). It can only be derived from the database table by querying FINISHTIME, FETCHTIME and LASTUPDATETIME.

A job is unfetched if it has a defined FINISHTIME but zero FETCHTIME.
A job is fetched if it has a defined FINISHTIME and FETCHTIME.
A job is considered a zombie if it has zero FINISHTIME and the LASTUPDATETIME is too far in the past.
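
The three rules above can be sketched as a small classification function. This is only an illustration: it assumes that all timestamps use the same unit (for example epoch milliseconds), that an undefined time is stored as zero or NULL, and that "too far in the past" is expressed as a threshold `zombie_retention`. The function name and the label `active` for jobs that match none of the three rules are hypothetical and not part of PTV xServer.

```python
def life_cycle(finish_time, fetch_time, last_update_time, now, zombie_retention):
    """Classify one XSERVER_JOBS row by its three timestamp columns.

    Assumption: zero or None means "not set"; all values share one time unit.
    """
    if finish_time:
        # FINISHTIME is defined: the job has a result (or an error).
        return "fetched" if fetch_time else "unfetched"
    # FINISHTIME is zero: the job is unfinished. It counts as a zombie
    # once it has not been updated for longer than the threshold.
    if now - (last_update_time or 0) > zombie_retention:
        return "zombie"
    return "active"  # hypothetical label: unfinished but still updated
```

For example, a row with FINISHTIME 1000 and FETCHTIME 0 is classified as unfetched, regardless of the other values.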

The retention periods for each type of job can be configured in xserver.properties:

jobResultRetentionTime - time to live for unfetched jobs. This should be a generous amount of time (days or weeks) and is meant as a protection against jobs forgotten by the client.
fetchedJobResultRetentionTime - time to live for fetched jobs. This period should not be too long (minutes or hours) to protect the database against excessive growth but still allow a grace period during which clients can retry a download in case of network issues.
zombieJobRetentionTime - time to live for unfinished and no longer updated jobs. This period should be short (hours) and is meant to clean up "zombie" (or "orphaned") jobs in case of crashes of the processing instance.

You can effectively deactivate the automatic cleanup task by setting these properties to a very high value, in case you want to provide your own cleanup mechanism.
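
If you provide your own cleanup mechanism, it has to apply the same three retention rules. The following sketch shows one possible shape of such a task. It makes several assumptions that are not confirmed by this documentation: timestamps are epoch milliseconds, undefined times are stored as zero, and each retention period is measured from FINISHTIME (unfetched), FETCHTIME (fetched) or LASTUPDATETIME (zombie). An in-memory SQLite table with only the relevant columns stands in for the real job database, and the function name `cleanup` is hypothetical.

```python
import sqlite3

HOUR = 3_600_000  # milliseconds (assumed time unit)

def cleanup(conn, now, unfetched_ttl, fetched_ttl, zombie_ttl):
    """Delete expired jobs and return the number of removed rows."""
    cur = conn.execute(
        """
        DELETE FROM XSERVER_JOBS
        WHERE (FINISHTIME > 0 AND FETCHTIME = 0 AND FINISHTIME < ?)
           OR (FINISHTIME > 0 AND FETCHTIME > 0 AND FETCHTIME < ?)
           OR (FINISHTIME = 0 AND LASTUPDATETIME < ?)
        """,
        (now - unfetched_ttl, now - fetched_ttl, now - zombie_ttl),
    )
    conn.commit()
    return cur.rowcount

# Demo with a simplified stand-in table and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE XSERVER_JOBS (ID TEXT PRIMARY KEY, FINISHTIME INTEGER, "
    "FETCHTIME INTEGER, LASTUPDATETIME INTEGER)"
)
now = 1000 * HOUR
conn.executemany(
    "INSERT INTO XSERVER_JOBS VALUES (?, ?, ?, ?)",
    [
        ("old-unfetched", now - 200 * HOUR, 0, now - 200 * HOUR),
        ("new-unfetched", now - HOUR, 0, now - HOUR),
        ("old-fetched", now - 3 * HOUR, now - 2 * HOUR, now - 2 * HOUR),
        ("zombie", 0, 0, now - 5 * HOUR),
        ("running", 0, 0, now - 1000),
    ],
)
# One week for unfetched, five minutes for fetched, two hours for zombies.
removed = cleanup(conn, now, 168 * HOUR, HOUR // 12, 2 * HOUR)
remaining = sorted(row[0] for row in conn.execute("SELECT ID FROM XSERVER_JOBS"))
```

In this demo, the recently finished job and the job that is still being updated survive the cleanup; the other three rows are removed.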

Programming Guide

In the asynchronous protocol, a transaction no longer consists of a single HTTP request and response pair but of a sequence of such HTTP message exchanges. It is important to understand these messages as well as the underlying Job object and its statuses.

Name Scheme for Operations

Every long running transaction of the form

runLong(RequestParameters): ResultType

is replaceable with a pair of operations:

startRunLong(RequestParameters): Job
fetchResultType(jobId: string): ResultType

All start operations begin with the prefix start, and closing operations begin with fetch.
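
As a quick illustration of this naming rule, the following sketch derives both asynchronous operation names from a synchronous operation name and its result type. The helper `async_names` is hypothetical; `runLong` and `ResultType` are the placeholders used above, not real operations.

```python
from typing import Tuple

def async_names(operation: str, result_type: str) -> Tuple[str, str]:
    """Return the (start, fetch) operation names for a synchronous operation."""
    # "start" + operation name with its first letter capitalized
    start = "start" + operation[0].upper() + operation[1:]
    # "fetch" + name of the result type
    fetch = "fetch" + result_type
    return start, fetch
```

Calling `async_names("runLong", "ResultType")` yields the pair `startRunLong` and `fetchResultType`.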

In addition, the protocol requires the use of the generic operations

watchJob(jobId: String, watchOptions: WatchOptions): Job
Documentation watchJob
stopJob(jobId: String): Job
Documentation stopJob
deleteJob(jobId: String): Job
Documentation deleteJob

Job Status and Progress

The Job object models meta information about the server job. It contains the following attributes:

id: string - the globally unique id of the job, used to retrieve the job meta data and the results
status: JobStatus - the status of the job
elapsedTime: int - the run time of the job so far, useful for judging how recent a job status is
progress: JobProgress - the progress information for the job; this is an optional attribute and may be missing. If a progress is available, its concrete type depends on the operation; for instance, bulk operations always return a BulkJobProgress. See the PTV xServer use-case documentation for details. If the request is finished, the progress values represent the state when the request was finished.

All job-related operations except fetch will return a Job object. The status of the job will be QUEUING when returned from the start operation, STOPPING when returned from the stopJob operation, and DELETED when returned from the deleteJob operation. The watchJob operation can return any of the possible status codes:

job status	description	watchable	has progress?	can fetch?	can stop?	can delete?
`QUEUING`	the job has been scheduled for execution	when server is under load and the job has to wait	no	no	no	yes
`RUNNING`	the job is being executed	if the result is not yet available	yes	no	yes	yes
`STOPPING`	the job is being finished prematurely	until the current processing step has finished	yes	no	yes (ignored)	yes
`SUCCEEDED`	the job was successful	after processing has terminated	yes	yes	yes (ignored)	yes (erases)
`FAILED`	the job has failed	after processing has terminated	yes	yes (returns with the exception)	yes (ignored)	yes (erases)
`DELETED`	the job has been deleted	while the job has not yet been terminated and cleaned up	no	no	no	yes (ignored)
`UNKNOWN`	no current job with this id	if id is wrong, or job has already been deleted or fetched	no	no	no	yes (ignored)
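
Client code can encode the table above as a lookup to decide which operations make sense for a given status. The following is a minimal sketch: the status names are taken from the table, while the Python names (`JobStatus`, `allowed_operations` and the set constants) are hypothetical and not part of the bundled clients.

```python
from enum import Enum

class JobStatus(Enum):
    QUEUING = "QUEUING"
    RUNNING = "RUNNING"
    STOPPING = "STOPPING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    DELETED = "DELETED"
    UNKNOWN = "UNKNOWN"

# Columns "has progress?", "can fetch?" and "can stop?" of the table.
HAS_PROGRESS = {JobStatus.RUNNING, JobStatus.STOPPING,
                JobStatus.SUCCEEDED, JobStatus.FAILED}
CAN_FETCH = {JobStatus.SUCCEEDED, JobStatus.FAILED}
CAN_STOP = {JobStatus.RUNNING, JobStatus.STOPPING,
            JobStatus.SUCCEEDED, JobStatus.FAILED}

def allowed_operations(status: JobStatus) -> list:
    """List the operations the table marks as admissible for a status."""
    ops = {"watchJob", "deleteJob"}  # admissible in every status
    if status in CAN_FETCH:
        ops.add("fetch")
    if status in CAN_STOP:
        ops.add("stopJob")  # only has an effect while RUNNING
    return sorted(ops)
```

A status string received from the server can be converted with `JobStatus("RUNNING")` before the lookup.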

The following diagram illustrates all possible status changes.

Operation `watchJob`

The watchJob operation returns immediately if the job has state SUCCEEDED, FAILED, DELETED or UNKNOWN. Otherwise, the operation waits for a status update. If no update happens before the timeout (maximumPollingPeriod) expires, watchJob reports the last known status again; only the elapsed time of the job will have changed.

watchJob can also return for progress updates while the job is in state RUNNING. This is useful to inform end users about the progress. To enable this behaviour, specify the period for progress updates in milliseconds. watchJob will wait at least this period for a more up-to-date progress and report only the latest. If no new progress update arrives during the specified period, watchJob will wait for and return with the next available progress update. This mechanism helps to control server and client overhead by coalescing very frequent progress updates. If no progress waiting period is defined, progress will only be sent as part of status updates.
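
The coalescing behaviour can be illustrated with a small simulation. This is a simplified model of the description above, not the server implementation: it assumes that the client calls watchJob again immediately after each response, that the first call starts at time 0, and that the polling timeout never expires. The function name `coalesce` is hypothetical.

```python
def coalesce(update_times, period):
    """Simulate which progress updates successive watchJob calls report.

    update_times: ascending arrival times of progress updates (ms).
    period: progress waiting period (ms).
    Returns a list of (return_time, index_of_reported_update).
    """
    reported = []
    call_start = 0  # time at which the current watchJob call begins
    i = 0           # index of the first update not yet reported or skipped
    while i < len(update_times):
        deadline = call_start + period
        # Collect all updates that arrive within the waiting period.
        j = i
        while j < len(update_times) and update_times[j] <= deadline:
            j += 1
        if j > i:
            # Updates arrived: report only the latest one at the deadline.
            reported.append((deadline, j - 1))
            call_start, i = deadline, j
        else:
            # Nothing arrived: return with the next available update.
            reported.append((update_times[i], i))
            call_start, i = update_times[i], i + 1
    return reported
```

With updates arriving at 100, 200, 300 and 2500 ms and a period of 1000 ms, this model produces two responses instead of four: the first three updates are coalesced into one response, and the fourth is delivered as soon as it arrives.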

Operation `fetch`

The fetch operation is only admissible for job status SUCCEEDED or FAILED. In case of FAILED, fetch will report the service exception of the job as result object.

After the result object has been fetched (following status SUCCEEDED or FAILED), the server erases it after a short retention period. This period can be configured in xserver.properties via fetchedJobResultRetentionTime (default: 5 minutes). A result object that is available but never fetched also expires from the server, after a default retention period of 1 week. This period can be changed via jobResultRetentionTime in xserver.properties.

Operation `stopJob`

RUNNING jobs can be requested to terminate as soon as possible so that preliminary results can be retrieved using the appropriate fetch method. This can be useful when, while watching the job progress, the user decides that something does not work at all (for instance, all results of a bulk request so far were failures), or that the results already suffice (for instance, the current state of a long-running optimization is good enough).

stopJob only has an effect on RUNNING jobs. For convenience, it is also admissible to stop a job that is already STOPPING or finished; in these cases, the operation has no effect. For jobs in any other state (QUEUING, DELETED, UNKNOWN), stopJob fails.

Operation `deleteJob`

The delete operation will attempt to delete a job with the given id. Once the delete has been requested, there is no way to cancel the delete and get the results of the job.

If the job has not yet started, the request will be removed from the server queue. If the job is already finished, deleteJob will erase the results as well; this can be used to clean up results from the server without having to fetch them.

The DELETED state may not be visible to the client if the server cleans up the job quickly; in that case, watchJob will report an UNKNOWN job. For convenience, it is admissible to send a delete to an already DELETED or an UNKNOWN job. In these cases, the operation has no effect.

A Typical Job Transaction

A typical transaction looks like this in pseudocode:

myjobid = startRunLong(request).id
do
  jobstatus = watchJob(myjobid).status
while jobstatus not in [ FAILED, SUCCEEDED, DELETED, UNKNOWN ]
if jobstatus in [ FAILED, SUCCEEDED ] then
  myresult = fetchResultType(myjobid)
end

Of course, this is the simplest form of the transaction. You usually want to do more:

handle error conditions,
retry automatically in case of temporary connectivity issues,
display progress for users (in interactive scenarios),
allow users to stop or delete jobs (in interactive scenarios).

The clients bundled with PTV xServer provide convenience functions for such transactions.
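
For readers who write their own client, the transaction including retries and progress reporting could be sketched as follows. This is an illustration under assumptions, not the implementation of the bundled clients: the `start`, `watch` and `fetch` callables are hypothetical stand-ins for the generated operations, Job objects are modelled as plain dictionaries with the attributes described above, and a temporary connectivity issue is assumed to surface as a `ConnectionError`.

```python
import time

FINAL = {"FAILED", "SUCCEEDED", "DELETED", "UNKNOWN"}

def run_job(start, watch, fetch, request,
            max_retries=3, retry_delay=1.0, on_progress=None):
    """Start a job, watch it until it reaches a final status, fetch the result."""
    job_id = start(request)["id"]
    failures = 0
    while True:
        try:
            job = watch(job_id)
            failures = 0
        except ConnectionError:
            # Temporary connectivity issue: the job keeps running on the
            # server, so it is safe to simply watch again.
            failures += 1
            if failures > max_retries:
                raise
            time.sleep(retry_delay)
            continue
        if on_progress is not None and job.get("progress") is not None:
            on_progress(job["progress"])
        if job["status"] in FINAL:
            break
    if job["status"] in ("SUCCEEDED", "FAILED"):
        # For FAILED jobs, fetch reports the service exception of the job.
        return fetch(job_id)
    raise RuntimeError(f"job {job_id} ended with status {job['status']}")

# Demo with a fake server: one network error, then RUNNING, then SUCCEEDED.
responses = iter([
    ConnectionError("temporary network issue"),
    {"id": "job-1", "status": "RUNNING", "progress": 50},
    {"id": "job-1", "status": "SUCCEEDED", "progress": 100},
])

def fake_watch(job_id):
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item

seen = []
result = run_job(
    start=lambda request: {"id": "job-1", "status": "QUEUING"},
    watch=fake_watch,
    fetch=lambda job_id: {"jobId": job_id, "value": 42},
    request={},
    retry_delay=0.0,
    on_progress=seen.append,
)
```

Passing the operations in as callables keeps the loop independent of the transport, so the same logic can be exercised against a fake server as in the demo. Stopping or deleting a job on user request would be an additional call to stopJob or deleteJob from another thread or event handler and is left out here.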